Optimal Data Split Methodology for Model Validation

Authors

  • Rebecca Morrison
  • Corey Bryant
  • Gabriel Terejanu
  • Kenji Miki
  • Serge Prudhomme
Abstract

The decision to incorporate cross-validation into the validation of mathematical models raises an immediate question: how should one partition the data into calibration and validation sets? We answer this question systematically by presenting an algorithm that finds the optimal partition of the data subject to certain constraints. In doing so, we address two critical issues: 1) that the model be evaluated with respect to both its predictions of a given quantity of interest and its ability to reproduce the data, and 2) that the model be highly challenged by the validation set, assuming it is properly informed by the calibration set. The framework also relies on interaction among the experimentalist and/or modeler, who understand the physical system and the limitations of the model; the decision-maker, who understands and can quantify the cost of model failure; and the computational scientists, who strive to determine whether the model satisfies both the modeler’s and the decision-maker’s requirements. The framework is quite general and may be applied to a wide range of problems. Here, we illustrate it through a specific example involving a data reduction model for an ICCD camera from a shock-tube experiment located at the NASA Ames Research Center (ARC).
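
The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea of searching for a partition that challenges the model. Everything in it is an assumption made for illustration and not the authors' method: the straight-line model, the least-squares calibration, the fixed calibration-set size used as the only constraint, the RMS validation misfit used as the measure of "challenge", and the helper names `fit_line` and `most_challenging_split`.

```python
from itertools import combinations
from math import sqrt


def fit_line(xs, ys):
    """Least-squares calibration of a hypothetical model y = a*x + b.

    Assumes at least two points with distinct x values.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def most_challenging_split(xs, ys, n_cal):
    """Exhaustively search all partitions with n_cal calibration points.

    Returns (rms, calibration_indices, validation_indices) for the split
    whose validation set gives the largest RMS misfit of the calibrated
    model. Requires 2 <= n_cal < len(xs). The cost grows combinatorially,
    so this is only practical for small data sets.
    """
    indices = range(len(xs))
    best = None
    for cal in combinations(indices, n_cal):
        val = [i for i in indices if i not in cal]
        a, b = fit_line([xs[i] for i in cal], [ys[i] for i in cal])
        rms = sqrt(sum((ys[i] - (a * xs[i] + b)) ** 2 for i in val) / len(val))
        if best is None or rms > best[0]:
            best = (rms, list(cal), val)
    return best


# Synthetic data with a quadratic term the linear model cannot represent.
xs = [i / 7 for i in range(8)]
ys = [2.0 * x + 1.0 + 0.5 * x ** 2 for x in xs]
rms, cal_idx, val_idx = most_challenging_split(xs, ys, 5)
print(cal_idx, val_idx, rms)
```

A fuller treatment along the lines of the abstract would presumably also score each split on the predicted quantity of interest and encode the modeler's and decision-maker's requirements as constraints; the sketch leaves both out.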

Similar resources

Optimal conditions for the biological removal of ammonia from wastewater of a petrochemical plant using the response surface methodology

High concentrations of nitrogen compounds, such as the ammonia observed in the petrochemical industry, are major environmental pollutants. Therefore, effective and inexpensive methods are needed for their treatment. Biological treatment of various pollutants is a low-cost and biocompatible replacement for current physico-chemical systems. The use of aquatic plants is an effective way to absorb th...

On Optimal Data Split for Generalization Estimation and Model Selection

Modeling with flexible models, such as neural networks, requires careful control of the model complexity and generalization ability of the resulting model. Whereas general asymptotic estimators of generalization ability have been developed over recent years (e.g., [9]), it is widely acknowledged that in most modeling scenarios there isn't sufficient data available to reliably use these estimato...

Optimal Cross-Validation Split Ratio: Experimental Investigation

Cross-validation is a widespread method for assessing the generalisation ability of a model in order to tune a regularisation parameter or other hyper-parameters of a learning process. Using cross-validation requires setting yet another parameter, the split ratio. Few texts have investigated the asymptotic setting of this ratio theoretically, and no consensus has emerged. In this cont...

Simulation and Model Validation of Batch PHB Production Process Using Ralstonia eutropha

Mathematical modeling and simulation of the microbial polyhydroxybutyrate (PHB) production process is beneficial for optimization, design, and control purposes. In this study, a batch model developed by Mulchandani et al. [1] was used to simulate the process in the MATLAB environment. It was revealed that the kinetic model parameters had been estimated away from the optimum or at a local optimum. There...

Designing an Optimal Model for the Educational System in the University of Applied Science and Technology

Purpose: The purpose of this study was to design a suitable model for the educational system of the University of Applied Science and Technology. Methodology: This was a mixed qualitative-quantitative study. In the qualitative section of the research, the main components of the model for the educational system of this university were identified by interviewing experts. In the quantit...

Publication date: 2011